{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 超参数和模型参数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**超参数**：在算法运行前需要决定的参数\n",
    "\n",
    "**模型参数**：算法过程中学习的参数\n",
    "\n",
    "KNN算法没有模型参数\n",
    "    \n",
    "KNN算法中的k是典型的超参数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 寻找好的超参数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 领域知识\n",
    "* 经验数值\n",
    "* 实验探索"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn import datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "digit = datasets.load_digits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = digit.data\n",
    "y = digit.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=666)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from encapsulations.KNN import KNNClassifier\n",
    "knn_clf = KNNClassifier(k=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9888888888888889"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "knn_clf.fit(X_train,y_train)\n",
    "knn_clf.score(X_test,y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9888888888888889"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "my_knn_clf = KNeighborsClassifier(n_neighbors=3)\n",
    "my_knn_clf.fit(X_train,y_train)\n",
    "my_knn_clf.score(X_test,y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 寻找最好的K"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "best_k= 4  best_score= 0.9916666666666667\n"
     ]
    }
   ],
   "source": [
    "best_score = 0.0\n",
    "best_k = -1\n",
    "for k in range(1,11):\n",
    "    knn_clf = KNeighborsClassifier(n_neighbors=k)\n",
    "    knn_clf.fit(X_train,y_train)\n",
    "    score = knn_clf.score(X_test,y_test)\n",
    "    if score > best_score:\n",
    "        best_k = k\n",
    "        best_score = score\n",
    "print(\"best_k=\",best_k,\" best_score=\",best_score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**如果上述程序找到最好的k为10的话，通常需要继续往10以上的超参数再继续寻找，以防更好的k值在边界的外围**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## K均值中的距离问题"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 考虑距离？不考虑距离？\n",
    "\n",
    "\n",
    "\n",
    "**考虑距离有利于解决平票的问题**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "best_method= distance \tbest_k= 4  best_score= 0.9916666666666667\n"
     ]
    }
   ],
   "source": [
    "best_score = 0.0\n",
    "best_k = -1\n",
    "for method in [\"uniform\",\"distance\"]:\n",
    "    for k in range(1,11):\n",
    "        knn_clf = KNeighborsClassifier(n_neighbors=k,weights=method)\n",
    "        knn_clf.fit(X_train,y_train)\n",
    "        score = knn_clf.score(X_test,y_test)\n",
    "        if score > best_score:\n",
    "            best_k = k\n",
    "            best_score = score\n",
    "print(\"best_method=\",method,\"\\tbest_k=\",best_k,\" best_score=\",best_score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 探索明科夫斯基距离中的P"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当method使用\"uniform\"时不牵扯P这个超参数，当method使用“distance”时，则P默认为2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "best_p= 5 \tbest_k= 3  best_score= 0.9888888888888889\n"
     ]
    }
   ],
   "source": [
    "best_p = -1\n",
    "best_score = 0.0\n",
    "best_k = -1\n",
    "for k in range(1,11):\n",
    "    for p in range(1,6):\n",
    "        knn_clf = KNeighborsClassifier(n_neighbors=k,weights=\"distance\",p=p)\n",
    "        knn_clf.fit(X_train,y_train)\n",
    "        score = knn_clf.score(X_test,y_test)\n",
    "        if score > best_score:\n",
    "            best_k = k\n",
    "            best_score = score\n",
    "            best_p = p\n",
    "print(\"best_p=\",p,\"\\tbest_k=\",best_k,\" best_score=\",best_score)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
